Generalized bucketization scheme for flexible privacy settings
Abstract
Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not require a taxonomy of attribute values, so it is applicable to more data sets. A drawback of previous bucketization schemes is their uniform privacy setting and uniform bucket size, which often result in an unachievable privacy goal or excessive information loss when sensitive values vary in sensitivity. In this work, we present a flexible bucketization scheme to address these issues. In the flexible scheme, each sensitive value can have its own privacy setting, and buckets of different sizes can be formed. The challenge is to determine proper bucket sizes and group sensitive values into buckets so that the privacy setting of each sensitive value is satisfied and overall information loss is minimized. We define the bucket setting problem to formalize this requirement and present two efficient solutions to it. The first solution is optimal under the assumption that two different bucket sizes are allowed, and the second is a heuristic that does not rely on this assumption. We experimentally evaluate the effectiveness of this generalized bucketization scheme. © 2016 Elsevier Inc. All rights reserved.
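The basic idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the `bucket_of` assignment, and the `max_share` thresholds are hypothetical, and the privacy setting is assumed here to be a per-value cap on the fraction of a bucket that a sensitive value may occupy. The sketch only shows how buckets break the record-level link and how per-value settings with different bucket sizes might be checked.

```python
import random
from collections import Counter


def publish_buckets(records, bucket_of, seed=0):
    """Group records into buckets and shuffle sensitive values inside each bucket.

    records:   list of (quasi_identifier, sensitive_value) pairs.
    bucket_of: bucket id for each record (hypothetical input; choosing it
               well is the hard problem the paper addresses).
    Returns {bucket_id: (quasi_identifiers, shuffled_sensitive_values)}.
    """
    rng = random.Random(seed)
    buckets = {}
    for (qi, sv), b in zip(records, bucket_of):
        qis, svs = buckets.setdefault(b, ([], []))
        qis.append(qi)
        svs.append(sv)
    for _, svs in buckets.values():
        rng.shuffle(svs)  # obscure which record holds which sensitive value
    return buckets


def satisfies_settings(buckets, max_share):
    """Check each sensitive value against its own (assumed) share threshold."""
    for _, svs in buckets.values():
        size = len(svs)
        for value, count in Counter(svs).items():
            if count / size > max_share[value]:
                return False
    return True


records = [
    ("zip=10001", "cancer"), ("zip=10002", "flu"),
    ("zip=10003", "flu"), ("zip=10004", "cold"),
    ("zip=10005", "flu"), ("zip=10006", "cold"),
]
bucket_of = [0, 0, 0, 0, 1, 1]  # one bucket of size 4, one of size 2
max_share = {"cancer": 0.25, "flu": 0.5, "cold": 0.5}  # stricter for "cancer"

buckets = publish_buckets(records, bucket_of)
print(satisfies_settings(buckets, max_share))
```

In this sketch the more sensitive value is placed in the larger bucket so that its share stays within its stricter threshold, while less sensitive values can sit in a smaller bucket.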
Similar resources
Improving Privacy And Data Utility For High-Dimensional Data By Using Anonymization Technique
Privacy Preserving is one of the significant methods in data mining to hide the sensitive information. Anonymization techniques like generalization and bucketization have been used for privacy preserving. The main problem with generalization is it is not applicable for high-dimensional data and bucketization technique does not avoid membership disclosure. Slicing is one of the novel techniques ...
Efficient Techniques for Preserving Microdata Using Slicing
Privacy preserving publishing is the kind of techniques to apply privacy to collected vast amount of data. One of the recent problem prevailing is in the field of data publication. The data often consist of personally identifiable information so releasing such data consists of privacy problem. Several anonymization techniques such as generalization and bucketization have been designed for priva...
Segmenting: A New-Fangled Advance to Isolation Conserving Facts Distributing
Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown th...
A Survey on Privacy Preservation in Data Publishing
Privacy preservation is the most concentrated issue in data publishing, as the sensitive information should not be leaked. For this sake, several techniques such as generalization, bucketization are proposed, in order to deal with privacy preservation. However, generalization fails on high dimensional data because of dimensionality and it causes information loss due to uniform distrib...
A centralized privacy-preserving framework for online social networks
There are some critical privacy concerns in the current online social networks (OSNs). Users' information is disclosed to different entities that they were not supposed to access. Furthermore, the notion of friendship is inadequate in OSNs since the degree of social relationships between users dynamically changes over the time. Additionally, users may define similar privacy settings for their f...
Journal: Inf. Sci.
Volume: 348
Publication date: 2016